Computing PageRank in a Distributed Internet Search System
نویسندگان
چکیده
Existing Internet search engines use web crawlers to download data from the Web. Page quality is measured on central servers, where user queries are also processed. This paper argues that using crawlers has a list of disadvantages. Most importantly, crawlers do not scale. Even Google, the leading search engine, indexes less than 1% of the entire Web. This paper proposes a distributed search engine framework, in which every web server answers queries over its own data. Results from multiple web servers will be merged to generate a ranked hyperlink list on the submitting server. This paper presents a series of algorithms that compute PageRank in such framework. The preliminary experiments on a real data set demonstrate that the system achieves comparable accuracy on PageRank vectors to Google’s wellknown PageRank algorithm and, therefore, high quality of query results.
منابع مشابه
The Implementation of Hadoop-based Crawler System and Graphlite-based PageRank-Calculation In Search Engine
Nowadays, the size of the Internet is experiencing rapid growth. As of December 2014, the number of global Internet websites has more than 1 billion and all kinds of information resources are integrated together on the Internet , however,the search engine is to be a necessary tool for all users to retrieve useful information from vast amounts of web data. Generally speaking, a complete search e...
متن کاملDistributed Pagerank for P2P Systems
This paper defines and describes a fully distributed implementation of Google’s highly effective Pagerank algorithm, for “peer to peer”(P2P) systems. The implementation is based on chaotic (asynchronous) iterative solution of linear systems. The P2P implementation also enables incremental computation of pageranks as new documents are entered into or deleted from the network. Incremental update ...
متن کاملA Framework for Evaluating Cloud Computing User’s Satisfaction in Information Technology Management
Cloud computing is a new discussion in enterprise IT. It has already become popular in terms of distributed technology in some companies. It enables managers to setup and run the intended businesses by avoiding excessive spending on computers, software and hiring expert staff, which proves to be cost effective. Cloud computing also helps users pay for the IT services without spending massive am...
متن کاملFast Distributed PageRank Computation
Over the last decade, PageRank has gained importance in a wide range of applications and domains, ever since it first proved to be effective in determining node importance in large graphs (and was a pioneering idea behind Google’s search engine). In distributed computing alone, PageRank vectors, or more generally random walk based quantities have been used for several different applications ran...
متن کاملKnowing Where to Search: Personalized Search Strategies for Peers in P2P Networks
Optimizing and focusing search and results ranking in P2P networks becomes more and more important with the increasing size of these networks. Even though a few approaches have already started to investigate the computation of PageRank-like values in P2P environments, none so far has investigated how personalization could be added to it. This paper tackles the problem of distributedly computing...
متن کامل